There are three datasets covering the stations, trips, and weather from 2014-2016:
Station dataset: one record per station, including its id and coordinates
Trip dataset: one record per individual trip
Weather dataset: daily weather information in the service area
The trip file had the header row repeated after the values of one line; I removed it by deleting those header values from the file. I also noticed that the first several rows were duplicated, and the line carrying the stray header was one of them, so I used the values from its original copy to fill in the missing ones.
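For reproducibility, a scan like the following can locate such malformed lines before fixing them by hand (a sketch; the path and the assumption that the stray header text appears verbatim inside a data line are mine):
# Sketch: report any line past the first that contains the header text,
# i.e. a header repeated among or after the values of a row.
with open('cycle_share/trip.csv') as f:
    header = f.readline().strip()
    for lineno, line in enumerate(f, start=2):
        if header in line:
            print(lineno, line[:80])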
In [1]:
import pandas as pd
import numpy as np
sets = ['station', 'trip', 'weather']
cycle = {}
# Load the three CSVs into a dict keyed by dataset name.
for s in sets:
    cycle[s] = pd.read_csv('cycle_share/' + s + '.csv')
In [2]:
cycle['trip'].head()
Out[2]:
In [3]:
# Print summary statistics for each of the three datasets.
for name in cycle:
    print(name)
    print(cycle[name].describe(include='all'))
    print('\n')
Gender and year of birth have nulls. I don't think we should drop those rows, since we would lose over 100,000 of them; instead we could replace the missing years of birth with the median or mean. For gender no sensible replacement exists, but it is worth noting that most of the entries are male.
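A minimal sketch of the median imputation suggested above (assuming the trip file's year-of-birth column is named 'birthyear'):
# Replace missing years of birth with the median; the column name is an assumption.
median_year = cycle['trip']['birthyear'].median()
cycle['trip']['birthyear'] = cycle['trip']['birthyear'].fillna(median_year)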
In [5]:
cycle['trip'].groupby('gender')['trip_id'].count()
Out[5]:
In [39]:
# Extract the date portion of 'starttime' (everything before the first space)
# as a string, so trips can be joined to the weather table's 'Date' column.
cycle['trip']['date'] = cycle['trip']['starttime'].apply(lambda x: x[0:x.find(' ')])
cycle['trip'].head()
Out[39]:
In [40]:
# Left-join each trip onto that day's weather record.
trip_weather = pd.merge(cycle['trip'], cycle['weather'], left_on='date', right_on='Date', how='left')
trip_weather.head()
Out[40]:
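Since this is a left join, a trip whose date has no counterpart in the weather table survives with NaN in the weather columns; a quick count shows whether that happened:
# Trips that found no matching weather row (0 means every date matched).
print(trip_weather['Date'].isna().sum())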
In [68]:
print(len(trip_weather))
trip_weather = trip_weather.drop_duplicates()
print(len(trip_weather))
In [72]:
print(len(trip_weather['trip_id']))
print(len(trip_weather['trip_id'].unique()))
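If those two counts differ, some trips matched more than one weather row, which would point to duplicated dates in the weather table; a check along these lines would confirm:
# Dates that appear more than once in the weather table.
date_counts = cycle['weather']['Date'].value_counts()
print(date_counts[date_counts > 1])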
In [80]:
# Attach station coordinates twice: once for the origin station and once for
# the destination; the suffixes disambiguate the overlapping 'lat'/'long' columns.
trip_weather = pd.merge(trip_weather, cycle['station'][['station_id', 'lat', 'long']], left_on='from_station_id', right_on='station_id', how='left').drop('station_id', axis=1)
trip_weather = pd.merge(trip_weather, cycle['station'][['station_id', 'lat', 'long']], left_on='to_station_id', right_on='station_id', how='left', suffixes=['_from_station', '_to_station']).drop('station_id', axis=1)
trip_weather.head()
Out[80]:
In [129]:
# Round each trip duration to the nearest half unit (x.0 or x.5).
def round_trips(duration):
    roundings = np.array([np.floor(duration), np.floor(duration) + 0.5, np.ceil(duration)])
    return roundings[np.argmin(np.abs(duration - roundings))]

trip_weather['tripduration'] = trip_weather['tripduration'].apply(round_trips)
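For example, under this rule 4.1 snaps down to 4.0 while 4.3 snaps to 4.5, the nearest of 4.0, 4.5, and 5.0:
print(round_trips(4.1))   # 4.0
print(round_trips(4.3))   # 4.5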
In [130]:
trip_weather['tripduration'].head(10)
Out[130]:
In [140]:
# Sanity check: trips whose recorded duration differs from the stop - start
# interval by more than 60 seconds.
trip_weather[np.abs(((pd.to_datetime(trip_weather['stoptime']) - pd.to_datetime(trip_weather['starttime'])) / np.timedelta64(1, 's')) - trip_weather['tripduration']) > 60]
Out[140]:
In [154]:
# 'Max_Gust_Speed_MPH' is not numeric because missing values are recorded as '-';
# replace them with NaN and convert the column to float.
trip_weather['Max_Gust_Speed_MPH'] = trip_weather['Max_Gust_Speed_MPH'].replace('-', np.nan).astype('float')
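An equivalent one-liner is pd.to_numeric with errors='coerce', which turns any unparseable token (not just '-') into NaN:
# Same result: coerce every non-numeric value to NaN.
trip_weather['Max_Gust_Speed_MPH'] = pd.to_numeric(trip_weather['Max_Gust_Speed_MPH'], errors='coerce')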
In [155]:
trip_weather['Max_Gust_Speed_MPH'].describe()
Out[155]:
In [156]:
trip_weather.to_csv('cycle_share/trip_weather.csv')
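One caveat: by default to_csv also writes the DataFrame index as an unnamed first column, which would reappear on re-import; passing index=False avoids that:
# Write without the row index to avoid a stray unnamed column on re-import.
trip_weather.to_csv('cycle_share/trip_weather.csv', index=False)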
This dataset contains 28 attributes for various movie titles scraped from IMDb. Each record is supposed to be a unique title, with the following attributes:
"movie_title", "color", "num_critic_for_reviews", "movie_facebook_likes", "duration", "director_name", "director_facebook_likes", "actor_3_name", "actor_3_facebook_likes", "actor_2_name", "actor_2_facebook_likes", "actor_1_name", "actor_1_facebook_likes", "gross", "genres", "num_voted_users", "cast_total_facebook_likes", "facenumber_in_poster", "plot_keywords", "movie_imdb_link", "num_user_for_reviews", "language", "country", "content_rating", "budget", "title_year", "imdb_score", "aspect_ratio"
The original set is available on Kaggle.
In [157]:
movies = pd.read_csv('movies/movies_data.csv')
movies.head()
Out[157]:
In [224]:
movies.dtypes
Out[224]:
In [159]:
movies.describe(include='all')
Out[159]:
In [180]:
print(movies['color'].unique())
# Normalize the lowercase variants of the 'color' flag to the canonical labels.
movies['color'] = movies['color'].apply(lambda x: 'Color' if x == 'color' else 'Black and White' if x == 'black and white' else x)
print(movies['color'].unique())
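The same normalization can be written as a single vectorized replace with a mapping, which is easier to extend if more variants turn up:
# Equivalent vectorized form of the normalization above.
movies['color'] = movies['color'].replace({'color': 'Color', 'black and white': 'Black and White'})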
In [210]:
def find_spaces(df):
    """Return the object columns that contain leading or trailing spaces."""
    cols = []
    for index, value in df.dtypes[df.dtypes == 'object'].iteritems():
        if df[index].str.startswith(' ').any() | df[index].str.endswith(' ').any():
            cols.append(index)
    return cols
find_spaces(movies)
Out[210]:
In [217]:
for col in find_spaces(movies):
    movies[col] = movies[col].str.lstrip().str.rstrip()
find_spaces(movies)
Out[217]:
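As an aside, .str.strip() trims both sides in one call and behaves identically to the lstrip/rstrip chain above:
# Equivalent single-call form of the whitespace cleanup.
for col in find_spaces(movies):
    movies[col] = movies[col].str.strip()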
In [221]:
print(len(movies))
movies = movies.drop_duplicates()
print(len(movies))
In [252]:
# Titles that still appear more than once after exact-duplicate removal.
title_duplicates = list(movies['movie_title'].value_counts()[movies['movie_title'].value_counts() > 1].index)
movies[movies['movie_title'].isin(title_duplicates)].sort_values(by='movie_title')
Out[252]:
In [256]:
# Inspect one suspected duplicate pair in full.
print(movies.loc[337])
print(movies.loc[4584])
In [269]:
# Split the duplicated titles: if a title's most frequent IMDb link occurs more
# than once, the rows are true duplicates of one film; otherwise the title is
# shared by distinct films. Only the top link count matters, hence the break.
true_dup = []
false_dup = []
for title in title_duplicates:
    for index, value in movies[movies['movie_title'] == title]['movie_imdb_link'].value_counts().iteritems():
        if value > 1:
            true_dup.append(title)
        else:
            false_dup.append(title)
        break
print(true_dup)
print(false_dup)
In [299]:
# Disambiguate distinct films that share a title by appending the release year.
movies['movie_title'] = movies.apply(lambda x: x['movie_title'] + ' (' + str(int(x['title_year'])) + ')' if str(x['title_year']) != 'nan' and x['movie_title'] in false_dup else x['movie_title'], axis=1)
In [303]:
print(len(movies))
movies = movies.drop_duplicates('movie_title')
print(len(movies))
In [322]:
# Count each actor's appearances across all three actor slots.
actors = movies.groupby(['actor_1_name'])['movie_title'].count()
actors = actors.add(movies.groupby(['actor_2_name'])['movie_title'].count(), fill_value=0)
actors = actors.add(movies.groupby(['actor_3_name'])['movie_title'].count(), fill_value=0)
(actors / len(movies)).sort_values(ascending=False).head(20)
Out[322]:
In [353]:
# Per-slot first/last credited years; adding with fill_value=0 merely unions the
# disjoint per-slot columns so the row-wise min/max below can combine them.
actor_years = movies.groupby(['actor_1_name'])['title_year'].aggregate({'min_year_1': np.min, 'max_year_1': np.max})
actor_years = actor_years.add(movies.groupby(['actor_2_name'])['title_year'].aggregate({'min_year_2': np.min, 'max_year_2': np.max}), fill_value=0)
actor_years = actor_years.add(movies.groupby(['actor_3_name'])['title_year'].aggregate({'min_year_3': np.min, 'max_year_3': np.max}), fill_value=0)
actor_years['first_year'] = np.min(actor_years[['min_year_1', 'min_year_2', 'min_year_3']], axis=1)
actor_years['last_year'] = np.max(actor_years[['max_year_1', 'max_year_2', 'max_year_3']], axis=1)
actor_years.drop(['min_year_1', 'min_year_2', 'min_year_3', 'max_year_1', 'max_year_2', 'max_year_3'], axis=1, inplace=True)
actor_years['history_length'] = actor_years['last_year'] - actor_years['first_year']
actor_years['movie_number'] = actors
actor_years['movie_proportion'] = actors / len(movies)
actor_years
Out[353]:
In [372]:
# Dense-rank films by gross, highest first; films with missing gross rank last.
movies['gross_sales_rank'] = movies['gross'].rank(method='dense', ascending=False, na_option='bottom')
movies[['movie_title', 'gross', 'gross_sales_rank']].sort_values(by='gross_sales_rank').head(20)
Out[372]: